STPLNR is a package for sustainable transport planning with R Studio. It provides functions for solving common problems in transport planning and modelling, such as how to best get from point A to point B. The overall aim is to provide a reproducible, transparent and accessible toolkit to help people better understand transport systems and inform policy in transportation planning. The purpose of this assignment is to develop a series of analyses that turn “raw” origin-destination data into meaningful results. Our study area will be University City, and the analyses will describe worker flows from the greater Philadelphia region into University City. The census tracts identified for the University City area are: 42101980000, 42101009000, 42101036900, 42101008801, 42101008802. This assignment also includes exploring the flow of traffic through desire lines and how metrics like employment and income may affect transit patterns in this area of the city.
rm(list=ls())
library(tidyverse)
library(sf)
library(RSocrata)
library(viridis)
library(spatstat)
library(raster)
library(spdep)
library(FNN)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(knitr)
library(kableExtra)
library(tidycensus)
library(classInt)
library(kableExtra)
library(RColorBrewer)
library(hrbrthemes)
library(ggthemes)
library(ggplot2)
library(plotly)
library(scales)
library(extrafont)
library(basetheme)
library(RColorBrewer)
library(lemon)
library(data.table)
library(GGally)
require(gridExtra)
library(ggalt)
library(rgeos)
library(sp)
library(smoothr)
library(rgdal)
library(rgeos)
library(maptools)
library(ggmap)
library(scales)
library(ggeasy)
library("wesanderson")
library(spdep)
library(caret)
library(ckanr)
library(FNN)
library(grid)
library(ggcorrplot)
library(corrr)
library(kableExtra)
library(jtools)
library(ggstance)
library(ggpubr)
library(broom.mixed)
library(tab)
library(plotly)
library(stargazer)
library(pscl)
library(plotROC)
library(pROC)
library(lubridate)
library(sf)
library(lubridate)
library(tigris)
library(riem)
library(gridExtra)
library(knitr)
library(kableExtra)
library(vtable)
library(RSocrata)
library(stplanr)
library(leaflet)
col <- wes_palette("Moonrise2")
col5 <- colorRampPalette(col)(5)
col6 <- colorRampPalette(col)(6)
col7 <- colorRampPalette(col)(7)
col8 <- colorRampPalette(col)(8)
col10 <- colorRampPalette(col)(10)
col22 <- colorRampPalette(col)(22)
col29 <- colorRampPalette(col)(29)
col96 <- colorRampPalette(col)(96)
options(scipen = 999)
options(tigris_class = "sf")
root.dir = "https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/DATA/"
We started creating the markdown by installing the required packages
and loading them into the library. Some of the mainly used packages
arestplnr, tidyverse, tidycensus,
and dplyr. The final deliverable for this homework is an
RMarkdown file with the required plots and tables, knitted to a HTML
document.
The map below shows the centroids of each county in the state of
Pennsylvania. We used the pa_tract_geo.txt file, which
contained data on the census tracts in PA as well as the coordinates of
their centroids.
OD <- read.csv("~/STUDY/Coursework/Fall 2022/Planning_Methods/stplanr_export/pa_od_2017.csv",
colClasses = c("h_geocode" = "character", "w_geocode" = "character")) %>%
rename(trct_org = h_geocode,
trct_des = w_geocode)
OD<- OD %>% group_by(trct_org, trct_des) %>% summarise(n_com = n())
tract <- fread("~/STUDY/Coursework/Fall 2022/Planning_Methods/stplanr_export/pa_tract_geo.txt", select = c("GEOID", "INTPTLAT", "INTPTLONG"), colClasses = c("GEOID" = "character"))
tract <- st_as_sf(tract, coords = c("INTPTLONG", "INTPTLAT"), crs = 4326, agr = "constant")
ggplot()+
geom_sf(data = tract, color="#e86f00", size=1.5)+
labs(title="Centroid for Census Tracts",
subtitle = "Pennsylvania, 2017",
caption = "Longitudinal Employer Household Dyanamics Origin-Destination Employment Statistics (LODES)")+
theme_void()
After taking a look at the geography we were working with, we then
proceeded to narrow our focus down to university city. We used the
Philadelphia 2035 plan from the Philadelphia City
Planning Commission (PCPC) to decide what census tracts would be
included in the University City region. The fips codes for these tracts
were: 42101980000, 42101009000, 42101036900, 42101008801,
42101008802. We cleaned the GEOID columns to
make it similar across the data sets we were working with, and then
filtered the column trct_des (destination tracts), to
include Univeristy City. This meant that we were looking at trips that
originated in the Greater Philadelphia Region and ended in University
City.
phl<-st_read("https://opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson")
ggplot()+
geom_sf(data = phl)+
labs(title="Philadelphia",
subtitle = "Pennsylvania",
caption = "Open Data Philly")+
theme_void()
This section is where we take a look at desire lines that originate
in the Greater Philadelphia Region, with its destinations in University
City. We re-projected the desire lines to the projection that the
Philadelphia data set was in and then proceeded to visualize it. We can
observe that there is a lot of clustering and overlap of the desire
lines with destinations in University City, which can be attributed to
the number of educational, medical and commercial establishments in the
area. To make the visualization a little easier, we have included an
interaction map through the leaflet package that enables
you to zoom in and view the O-D pair patterns more closely.
OD<- OD %>% separate(trct_org, into = c('trct_org', 'B'), sep = -4, convert = TRUE) %>%
separate(trct_des, into = c('trct_des', 'C'), sep = -4, convert = TRUE) %>%
select(c(trct_des, trct_org,n_com))
OD <- OD %>% filter(trct_des== "42101980000" | trct_des == "42101009000"
| trct_des == "42101036900" | trct_des == "42101008801"
| trct_des == "42101008802")
dat_red<- OD[OD$trct_org %in% tract$GEOID & OD$trct_des %in% tract$GEOID, ]
dat_dsire<- od2line(flow = dat_red, zones = tract)
map1 <- leaflet() %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(data = dat_dsire, color = "red",weight = 0.3, fillOpacity = 0.5)
map1
desire<- st_as_sf(dat_dsire)
map2 <- leaflet() %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(data = desire, color = "green",weight = 0.3, fillOpacity = 0.5)
map2
desire$dist_units<- st_length(desire)
desire$dist<- as.numeric(st_length(desire))
desire_proj<- st_transform(desire, crs = 2272)
desire_proj$dist_ft<- st_length(desire_proj)
In this section, we looked at different variables we would like to consider in this analysis. We picked variables that we thought would have an effect on O-D travel patterns throughout the city. Each set of variables is divided into 3 categories: binary variables, continuous variables, and categorical variables. Each category of variables will have a table of summary statictics provided in this section.
# Binary Variables
desire$iphone<-sample(c("Yes", "No"), nrow(desire), replace = TRUE)
desire$vehicle<-sample(c("Yes", "No"), nrow(desire), replace = TRUE)
# Continuous Variable s
desire$n_com<- as.numeric(desire$n_com)
x <- 30000:80000
desire$medinc<-sample(x, nrow(desire), replace = TRUE)
y <- 10:70
desire$pctwork<-sample(y, nrow(desire), replace = TRUE)
# Categorical Variables
desire$edu<-sample(c("High school", "Bachelors", "Masters", "PhD"), nrow(desire), replace = TRUE)
desire$transit<-sample(c("Walk", "Bike", "Car-Pool", "Private Vehicle", "Public Transit"), nrow(desire), replace = TRUE)
table(desire$iphone)
The binary variables we have chosen are whether the commuter uses an iPhone or not and whether the commuter uses public transit to get to the University City destination. Binary variables are characterized by a (1,0) value, or in this case- a (Yes, No) value.
st(desire, vars = c('iphone','vehicle'),title = 'Binary Variables')
| Variable | N | Percent |
|---|---|---|
| iphone | 38168 | |
| … No | 19032 | 49.9% |
| … Yes | 19136 | 50.1% |
| vehicle | 38168 | |
| … No | 19121 | 50.1% |
| … Yes | 19047 | 49.9% |
We decided to look at 3 continuous variables for this section- which included median household income, number of commuters and percent of people employed in the census tract that we are looking at. These continuous variables focus primarily on work and employment, as a large number of commuters could be travelling into University City for work purposes. These variables could be beneficial in looking at how indicators like income and employment affect O-D demand.
st(desire, vars = c('medinc','n_com','pctwork'), title='Continuous Variables')
| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| medinc | 38168 | 54905.33 | 14413.247 | 30008 | 42383.75 | 67357 | 80000 |
| n_com | 38168 | 1 | 0 | 1 | 1 | 1 | 1 |
| pctwork | 38168 | 39.975 | 17.629 | 10 | 25 | 55 | 70 |
The two categorical variables we have chose are the level of educational attainment and the mode of commute. These variables are more individual specific and could be useful in discerning patterns of travel to and from the area of study.
st(desire, vars = c('edu','transit'), title='Categorical Variables')
| Variable | N | Percent |
|---|---|---|
| edu | 38168 | |
| … Bachelors | 9557 | 25% |
| … High school | 9613 | 25.2% |
| … Masters | 9492 | 24.9% |
| … PhD | 9506 | 24.9% |
| transit | 38168 | |
| … Bike | 7611 | 19.9% |
| … Car-Pool | 7664 | 20.1% |
| … Private Vehicle | 7524 | 19.7% |
| … Public Transit | 7664 | 20.1% |
| … Walk | 7705 | 20.2% |
In the maps below we can see that the desire lines for median income, number of commuters and percentage of people employed seem to be fairly evenly distributed. We cannot be sure about the clustering of high and low values, because the variable values were assigned to each census tract at random, which does take away a little bit of nuance. There does seem to a higher number of O-D trips coming from the north-west region and some trips from South Philly. There is a distinctive lack of travel from the norther portion of our study area.
bins1 <- c(0, 30000, 40000, 70000, 80000)
pal1 <- colorBin("viridis", domain = desire$medinc, bins = bins1)
map3 <- leaflet() %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(data = desire, color =~pal1(medinc) ,weight = 0.4)%>% addLegend(pal = pal1, values = desire$medinc, opacity = 0.7, title = "Median Income",
position = "bottomright")
map3
In order to plot the map above for median income distribution we created bins of income below $30000, $30000 to $40000, $40000 to $70000 and greater than $70000. Since the sampling is random rather than a collected data set therefore the plots do not show the actual picture of the travelling patters. But this plot is a demonstration of how travel patters can be observed in terms of median income and help us get a better understanding of the relationship of median income to travel behavior for different parts of the geography. For instance here in the plots we see that the share of people with median income between $40000 to $70000 contribute to large number of trips. Also, the more concentration of these trips are coming from west and north-west region of the study area.
bins2 <- c(0, 20, 45, 70)
pal2 <- colorBin("magma", domain = desire$pctwork, bins = bins2)
map4 <- leaflet() %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(data = desire, color =~pal2(pctwork) ,weight = 0.4)%>% addLegend(pal = pal2, values = desire$pctwork, opacity = 0.7, title = "Percent Working Pop",
position = "bottomright")
map4
For the percentage of working population and to understand their travel behavior we see that working population percentage between 20 to 45 have the highest share of the trips. Additionally, working population of less than 20 percent contribute the lowest. This makes sense, however this is a randomly sampled data.
In the maps below we can see that the desire lines for categorical variables like level of education and mode of commute, that were previously defined in Section 2. Despite lacking the statistical power that continuous variables have, these categorical variables are more tailored to the population that we are studying. Categorical data also gives us the ability to quickly recognize trends, changes and patterns based on inter-related variables, such as the ones we have used in this section. Some other categorical variables we would like to consider in a future study would be gender, age and race.
pal3 <- colorFactor(palette = c('red', 'yellow','black','blue'), domain = desire$edu)
map5 <- leaflet(desire) %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(color = ~pal3(edu) ,weight = 0.4)%>% addLegend(pal = pal3, values = ~edu, opacity = 0.7, title = "Education background",
position = "bottomright")
map5
pal4 <- colorFactor(palette = c('red', 'yellow','black','blue', 'green'), domain = desire$transit)
map6 <- leaflet(desire) %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(color = ~pal4(transit) ,weight = 0.4)%>% addLegend(pal = pal4, values = ~transit, opacity = 0.7, title = "Transit Choice",
position = "bottomright")
map6
In the maps below we can see that the desire lines for binary variables like whether the commuter uses an iPhone and whether the commuter takes public transit. The advantages of using binary variables to inform the analysis in this section is that it is easier to group users based on defining characteristics that could potentially further our study. The disadvantage is that a lot of nuance is lost if we decide to classify all our numeric variables into binaries. It works well in this case because we are asking simple ‘yes’ or ‘no’ questions.
pal5 <- colorFactor(palette = c('red', 'black'), domain = desire$iphone)
map7 <- leaflet(desire) %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(color = ~pal5(iphone) ,weight = 0.4)%>% addLegend(pal = pal5, values = ~iphone, opacity = 0.7, title = "Iphone ownership",
position = "bottomright")
map7
pal6 <- colorFactor(palette = c('blue', 'green'), domain = desire$vehicle)
map8 <- leaflet(desire) %>%
addProviderTiles(providers$CartoDB.Positron)%>%setView(-76.26386, 40.34927, zoom = 8)%>%
addPolylines(color = ~pal6(vehicle) ,weight = 0.4)%>% addLegend(pal = pal6, values = ~vehicle, opacity = 0.7, title = "Vehicle ownership",
position = "bottomright")
map8
From our findings we noticed that places like University City, where major anchor institutions like universities and hospitals are located- are major employment hubs in a city. It is only natural that a large number of trips seem to cluster in this area.
we believe that to further understand the travel patters and its distribution with different demographic layers overlapped could help provided better insights to the study. And therefore, we tried combining the data sets we think have significant correlations with travel behavior.
From our findings in this initial study, we can see that variables related to income, employment and mode of commute play an important role in defining O-D patterns in a city.
In conclusion we think that travel patterns play a significant role in understanding an area’s attractions and necessities inviting people from different surrounding regions.
Final project that we are working on is about passive data in transportation and we believe that this assignment helped us think about different variables that might affect the travel behavior and get more informed insights for planning process. And we believe that passive data techniques can help towards working in recording other different attributes which have not been previously collected on a more granular level, helping to study travel patterns better.